1  Introduction


A cognitive approach to language asks both representational and computational questions. Our aim in our recent work, summarized in The Grammatical Basis of Linguistic Performance, is to discover both what our knowledge of language is (a question about representation) and how that knowledge is put to use (a question about computation). We argued, and we'll reinforce that argument here, that we can gain a deeper understanding of why natural languages are built the way they are by considering how the problems of efficient parsing and learning connect to the representation of grammars. We showed that if one is willing to make a few strong but natural assumptions about constraints on human parsing abilities and how grammars are used as parsers, then one can show, in part, why locality constraints like Subjacency must be a part of grammatical descriptions. Our assumptions were these:


•  Parsing is deterministic, in the sense that once information about the structure of a sentence is written down, it is never retracted. This means that the information about a sentence is monotonically preserved during analysis.

•  Grammatical representations are embedded directly into parsers, without intervening derived predicates or multiplied-out rule systems. This is an assumption of transparency (Berwick and Weinberg 1984).

•  The human brain is finite.
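
The monotonicity in the first assumption can be pictured with a small sketch. This is only an illustration under our own simplifying assumptions: the class name MonotonicParseState and its methods are hypothetical, and a real parser records far richer structure than a set of tuples.

```python
class MonotonicParseState:
    """Toy store of structural assertions that may grow but never shrink."""

    def __init__(self):
        self._facts = set()

    def assert_fact(self, fact):
        # Writing down new information is always allowed.
        self._facts.add(fact)

    def retract(self, fact):
        # A deterministic parser, in the sense used here, never takes anything back.
        raise RuntimeError(f"determinism violated: cannot retract {fact!r}")

    def facts(self):
        return frozenset(self._facts)


state = MonotonicParseState()
state.assert_fact(("NP", "John"))
before = state.facts()
state.assert_fact(("V", "believes"))
after = state.facts()

# Everything known earlier is still known later.
is_monotonic = before <= after
```

The only point of the sketch is that the set of recorded facts at any later moment is a superset of the set at any earlier moment.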

The assumptions about determinism and transparency are strong, but, as we'll see, natural. They are meant to be. Our explanatory punch works in direct proportion to the strength of the constraints; if we adopt a system where anything goes, then we cannot explain why languages are built one way rather than another.

Naturally, and fortunately, this leaves the system of assumptions open to refutation. In a recent article to appear in Language and Cognitive Processes (1985), Janet Fodor takes issue with both the linguistic details behind the theory of grammar we adopt and with the assumptions of monotonicity and transparency. We believe that each of these criticisms falls short, and we'll survey just what Fodor says as well as our own position, but before launching into a bill of particulars, it's worthwhile to step back and survey the approach Fodor implicitly endorses.

There's a style of theory construction in A.I. that might be dubbed "universal simulation." The idea is to adopt the weakest possible set of assumptions about a computational process, for fear of being wrong. A lampoon version goes something like this: (i) every cognitive process is a computational process; (ii) Turing machines can simulate any computational process; so (iii) I'd better adopt a Turing machine as a model of this cognitive process, because otherwise I may miss something. That's sheer hyperbole, of course, but something disturbingly close to this lies behind the embrace of nondeterminism as a central feature of parsing models. The problem, as we specifically observe in our book and as Fodor echoes, is that since nondeterministic computation subsumes deterministic computation, one can always simulate the effect of the deterministic assumption simply by making the cost of nondeterminism very high. What Fodor fails to note is the flip side to this point: one can always get the functional effect of recovery from failed determinism, such as garden paths, by adding recovery procedures to deterministic parsers. So why all the fuss? Don't these two apparently opposed camps just merge into a gray middle ground?

The difference is one of point of view and methodological stance. Forcing an essentially nondeterministic procedure to be deterministic by adding cost to backup violates the spirit of nondeterministic computation precisely in the same way that arbitrary backtracking would violate the spirit of determinism. We prefer to make the stronger, and more refutable, hypotheses about transparency and determinism. We'd argue that recovery from garden paths and near garden paths need not cause a deterministic parser to throw up its hands, but invokes quite particular, non-ad hoc reconstruction procedures that use the information built up about the parse in a deterministic way. More about that later. The important point here is that we adopt the determinism requirement as a basic article, a "leading idea," to be weakened only under duress and in quite limited, particular cases. In contrast, based on the same evidence, Fodor adopts nondeterminism as a leading idea. These different positions lead to quite different ways of thinking about parsing. For someone who endorses nondeterminism, the hard part isn't figuring out how parsing gets done (that's easier, because we have more machinery at our disposal); the hard part is figuring out what the constraints are and how to naturally enforce them. We must now be able to say why parsing isn't done some other way that is just as easy to encode using the extra machinery of nondeterminism. Plainly the burden of proof here falls on Fodor's shoulders; her position is the weaker one. One example of this point should suffice. Fodor argues that adding an extra memory cell or its functional equivalent to a transition network parser (e.g., a hold cell) makes parsing easy. Therefore, she concludes, it should be added. More strikingly, she comments: "B[erwick] and W[einberg] simply have to stipulate that their parser has no such facility." (page 50; our emphasis). But since when does one have to stipulate the nonexistence of additional machinery? As Marcus (1980:146) says on this point, "What demands explanation and motivation is why a given facility is included in the model ... Thus, there is no reason to explain why a mechanism of only limited power has been implemented if it can be shown that it is enough to do the job that is required." What is more, by sticking to more restricted machinery, we can actually explain some of the structural characteristics of natural languages.
Of course our leading idea may be incorrect. Then we will be led, regrettably, to nondeterminism, to nontransparency, and perhaps beyond. We say regrettably, because then we will be in a weaker position. Once the Pandora's box of unlimited nondeterministic computation is opened, we can nail it shut only by importing constraints from other domains. Again, this may be possible; we cannot rule it out. Fodor hints at constraints on grammar size having to do with parsing/learnability, but we'll see these arguments lack support. Simply put, the search space of nondeterministically- and nontransparently-based theories is much vaster. We prefer to start with the much smaller world of determinism and work outwards.

We were well aware of this difficulty in our book. That's why we took great pains to distinguish between two versions of nondeterminism: (1) "true" nondeterminism in parsing, where all interpretations are carried along simultaneously; and (2) "backtracking" nondeterminism, where all nondeterministic alternatives are explored one at a time. We carefully observed that our functional arguments bifurcating deterministic and nondeterministic parsing applied only to true nondeterminism. By thinking about this contrast, we were led to quite specific predictions about locality constraints in natural languages, predictions that are, as we show in our book and as we'll underscore below, confirmed.
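
The two versions of nondeterminism can be contrasted in a toy setting. The sketch below is ours and deliberately abstract: a "sentence" is reduced to a list of choice points, accept stands in for whatever finally disambiguates the analysis, and the function names backtracking_parse and parallel_parse are hypothetical. It is not a model of any particular parser discussed here.

```python
def backtracking_parse(choices, accept):
    """Explore alternatives one at a time, undoing a choice when it fails."""
    explored = 0  # number of complete analyses tried

    def search(prefix, rest):
        nonlocal explored
        if not rest:
            explored += 1
            return prefix if accept(prefix) else None
        for option in rest[0]:
            result = search(prefix + [option], rest[1:])
            if result is not None:
                return result
        return None  # every option failed: back up to the previous choice point

    return search([], choices), explored


def parallel_parse(choices, accept):
    """Carry every partial analysis forward simultaneously."""
    analyses = [[]]
    widest = 1  # largest number of analyses held at once
    for options in choices:
        analyses = [a + [o] for a in analyses for o in options]
        widest = max(widest, len(analyses))
    return [a for a in analyses if accept(a)], widest


# Two binary choice points; only the last combination is the right one.
choices = [["a", "b"], ["x", "y"]]
accept = lambda analysis: analysis == ["b", "y"]

first_found, tried = backtracking_parse(choices, accept)
all_found, held = parallel_parse(choices, accept)
```

In this toy case the backtracking version pays in the number of analyses it tries in sequence, while the parallel version pays in the number of analyses it holds at once; our functional arguments concern only the second kind of machine.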

This much said, we can turn to Fodor's particular objections. As we noted earlier, they fall into two parts: objections to our predictions about which constructions will obey Subjacency and which will not; and objections to our three key assumptions. As to the first set of objections, we'll see that while Fodor's more refined observations about what constructions obey Subjacency and what ones do not are correct, they in fact support our "leading idea" of determinism. The second set of objections centers on the assumptions of determinism and its relationship to efficient parsability, our "modular" parser design and the direct embedding of grammatical representations in the parser, and the restricted space for writing down grammatical operations.

2  Determinism  makes  the  right  grammatical 
predictions 

Turning first to the grammatical predictions of our model, Fodor's interesting critique argues that our approach is both too strong and too weak. It is too strong in that our approach predicts parasitic gaps to be subject to Subjacency. This is because their deterministic detection requires scanning the left context.¹ Nonetheless, we claimed that the distribution of these categories was

¹To show this, Fodor cites examples where, in order to know whether an adjunct clause with an ambiguous verb can take a parasitic gap object, we must see whether the matrix clause contains a wh element in COMP. The relevant examples are contrasted in (a) and (b):

(a)  What did you cook without eating?


not governed by Subjacency.

Further, our approach is too weak because it cannot distinguish a subset of gapping constructions that Fodor shows obey locality from a class that does not.²

First, we will show that Fodor's criticisms, while correct, deal with noncrucial assumptions of our analysis. The assumptions that replace them are fully compatible with our theory, and the data cited by Fodor actually support our analysis in interesting ways.³

2.1  Parasitic  gaps 

The most important thing to notice about our claim that parasitic gaps are not subject to Subjacency is that it is false. Chomsky (class lectures, 1984) provides the following examples showing that these constructions are in fact subject to this constraint:


1.  Who_i did you read a book about e_i to e_i?

2.  Which man_i did you interview e_i without reading up on e_i?

*3.  Which man_i did you interview e_i without reading [NP [the file]_j [S you made e_j on e_i]]?

In (1), both gaps are subjacent both to the complementizer and to each other. This is shown by both (4) and (5), where overt movement from both the parasitic and regular gap positions is acceptable.

4.  Who_i did you read a book about e_i.

5.  Who_i did you read the book (that Mary bought yesterday) to e_i.

(b)  Can you watch TV without eating?

In the second example, eating is unambiguously an intransitive verb, because there is no wh movement in the matrix clause.

²Before turning to these specific cases, let us dispense with one of Fodor's more general criticisms; namely, since the solution adopted does not solve all cases of parsing ambiguity, it is dubious from the evolutionary perspective. In fact, this kind of compromise is typical of what one finds in natural selection. The evolutionary literature abounds with cases where selection has opted for solutions that either solve part of an evolutionary problem or created other problems. (See footnote 10 of Berwick and Weinberg 1982.) Indeed Gould (1983) cautions us against adaptationists who theorized "a world of perfect design, not much different from that 'concocted' by 18th century natural theologians who 'proved' God's existence by the perfect architecture of organisms ... we do not inhabit a perfected world where natural selection ruthlessly scrutinizes all organic structures and then molds them for optimal utility." (1983:155-156).

³The following is a very condensed version of Weinberg (forthcoming).
Chomsky uses the contrast in (2) and (3) to argue that parasitic gaps are bound to empty operators and are licit only if they are subjacent to these operators. These empty operators are interpreted as marks of predication and so must appear at the head of the adjunct clause.⁴ Put in terms of our parsing model, we can use the presence of the overt operator to signal the presence of the "real" gap. The placement of the empty operator is governed by the independent principles of A-bar binding. The presence of the empty operator, in turn, can be used to signal the presence of the parasitic gap, if it is in subjacent position.⁵ In addition, Chomsky assumes that the theory of government interacts with the theory of bounding in that only ungoverned nodes count for bounding. Therefore, we will assume that the empty operator is subjacent to the real operator.⁶ This analysis predicts that (3) is bad because, as a sign of predication between the relative clause and the head of the complex NP, the empty operator inside this relative must be bound to (coindexed with) the head. Coindexing the parasitic gap to this operator as well will result in an ill-formed structure, because quantifiers cannot be bound to two variables, as in (6). Neither the overt operator at the head of the sentence, nor the empty operator at the head of the

⁴Alternatively, following Aoun and Clark (1985), we can claim that empty operators count as A-bar anaphors and so obey the locality conditions that apply to this class. See Weinberg (forthcoming) and Aoun, Hornstein, Lightfoot, and Weinberg (forthcoming) for details.

⁵This contrasts with Chomsky (1982), where parasitic gaps are considered underlying PROs. [...] (1983) provides independent arguments showing this account of the distribution of parasitic gaps is inadequate because it relies on the so-called functional definition of empty categories. In addition, the earlier analysis would obviously not predict the observed distribution of the data, since PROs are typically not bound by operators, empty or otherwise.

⁶Chomsky must argue that all ungoverned nodes (not just NP or S) are bounding with respect to Subjacency. This is because he wants to rule out direct movement from an adjunct as in (a):

(a)  *Which article did John read a book before filing

In order to rule this out using Subjacency, he must claim that both PP and S count as bounding nodes. Moreover, he must use Subjacency to rule these cases out, because this is the only S-structure condition available to him and the bounding constraint in these constructions is an S-structure phenomenon, as shown by the grammaticality of (b):

(b)  Who read a book before filing which article?

In Weinberg (forthcoming) and in Wahl (forthcoming) it is argued that the requirement of lexical proper government in Chomsky's ECP actually applies at the level of phonetic form (PF). This allows us to rule out a case like (a) by claiming that the trace in the COMP of the adjunct is not properly governed, as shown in the structure (c):

(c)  *[S' Which article_i [did John read a book [before [S' e_i [PRO filing e_i]]]]]

Therefore, we can maintain the position that only S and NP count for the bounding system. Thus the empty operator is subjacent to the real operator in parasitic gap constructions.




adjunct are subjacent to the gap, and so they cannot license it. Therefore this structure is ruled out. This contrasts with (2), where every trace is subjacent to the operator that licenses it, as shown in (7).

*6.  Which man_i [S did you [VP interview e_i] [PP without [S' OP_i [S PRO reading [NP the file_j [S' OP_j [S that you made e_j on e_i]]]]]]]]

7.  Who_i [S did you [VP interview e_i] [PP without [S' OP_i [PRO reading up on e_i]]]]?

Thus in fact, Fodor is correct in claiming that our analysis should predict that parasitic gaps are governed by Subjacency, and we were mistaken when we claimed in our book that it did not. But we were all incorrect in believing that the constraint did not hold. Assuming that we can show that the creation of empty operators causes no problems for a deterministic system, we can use their presence to license parasitic gaps in the appropriate structures. Thus we can make the parsing model predict the properties of this construction in a straightforward and independently motivated way. It is important to note at this point that we are not changing assumptions in an ad hoc way simply to model the facts. The problem with our first attempt was that we did not follow the logic of our predictions clearly. The model actually predicts that parasitic gaps should be governed by Subjacency, as Fodor notes in her article. In the next section, we will show that the model is non-ad hoc in other ways, in that it or something like this model is needed to solve a general parsing problem that is independent of the determinism issue.

In this section, we present an algorithm to create empty operators that is also compatible with a deterministic approach. Note that the case of empty operators in adjuncts is similar to the case of factive Noun Phrases cited by Fodor in her criticism of Marcus. As in factives, the presence of the overt operator makes parasitic gaps possible in adjunct positions, but it does not make them obligatory in these structures. Consider (8)-(10).

8.  Who did you meet without greeting.

9.  Who did you meet without greeting him.

10.  Who did you meet without clearing the rendezvous with security.

In a case like (8), the parser must place an empty operator in the complementizer of the adjunct phrase in order to bind the empty parasitic object of the verb greeting. In (9) and (10) by contrast, we do not want to place an empty operator in this position, because there is no parasitic gap in the adjunct for the operator to bind.⁷ In (9) the parasitic gap is filled by a pronoun and in (10),

⁷If these operators are available at all stages of comprehension then the fact that the empty operator has no variable to bind should make the sentence as bad as (a):


there is no corresponding gap position at all. Because of the possibility of successive cyclic movement, however, the parasitic gap can be indefinitely far away on the surface from the empty operator position. A deterministic parser with limited lookahead will not be able to wait for the disambiguating right context.⁸ Therefore, there will be certain cases where it will incorrectly place an empty operator in the adjunct's COMP.

Fodor implies that these facts pose a problem solely for deterministic parsers, suggesting that a nondeterministic solution is called for. In fact, the deterministic/nondeterministic issue is beside the point. If the distinction is between a deterministic parser and a nondeterministic parser that backtracks (Fodor's choice), then both will have problems because they both at least superficially predict that such cases cause people to have noticeable difficulties in comprehending these sorts of sentences. But none of (8)-(10) is difficult to understand.

The nondeterministic parsers with backtracking that Fodor cites divide cases of possible parser error into three types:

(a) Cases that are locally ambiguous but cause the parser no difficulty. Here it is claimed that either the backtracking needed to transform an incorrect false start into a correct analysis is so minor that it is not associated with a computational cost, or that these parsers use an exact analog of a deterministic parser's local buffer solution and thus always make the right choice. Some examples of this kind of case are given in (11).

11a.  John believes Bill.

11b.  John believes Bill is a fool.

Even if the parser mistakenly hypothesized that the subject of the embedded infinitival was the direct object of the verb believe, the backtracking needed to insert the infinitival S marker between it and the verb is minor, and a nondeterministic parser might be able to correct its mistake in a way that is relatively cost-free.⁹

In contrast, there are cases that require more extensive backtracking over essentially unbounded distances. These cases can be divided into two types.

(b) Cases for which people register a strong preference for one of the possible analyses (even when pragmatic biasing points to the other choice, but where

(a)  Who did John meet Mary?

⁸The requirement that lookahead be limited is crucial because, as Marcus (1980) notes, a deterministic parser with unlimited lookahead could well turn out to be able to simulate a nondeterministic machine.

⁹Note that this is true even for a deterministic parser, since we need only add a new piece of information. See the next section for a related example.


both readings are eventually available). An example of this case is shown in (12), where, as Fodor mentions, there is an initial preference for the reading where who is taken to be the subject of an embedded clause.

12.  Who_i did the little girl beg to sing those stupid French songs (for) e_i?

(c) Cases of conscious garden paths where one reading is difficult. These are cases where the alternative has to be pointed out, even if it is the only reading resulting in a grammatical sentence. These include the classic sentences as in (13):

13.  The  horse  raced  past  the  barn  fell. 

The processing load here might be compatible with a backtracking approach if it is assumed that backtracking over long distances is computationally costly. (It can often be difficult to assess these effects in a backtracking model; see the next section.) The extra burden imposed by true garden paths is a complex effect that is partly lexical, partly structural, and exacerbated by distance (in terms of number of alternative, but unconsidered pathways).

Cases like (8)-(10) cause problems for the backtracking approach because they break the association between the extent of backtracking necessary to correct false starts and perceived sentence complexity. None of the examples in (8)-(10) produce processing complexity. This shows that there is not even a preference for adjuncts with or without parasitic gaps. Whatever the first hypothesis of the (deterministic or backtracking) parser, whether it inserts an empty operator in the adjunct's complementizer or not, one of the structures is incorrectly predicted to be difficult to process because of extensive backtracking from the site of the disambiguating parasitic gap or end of the adjunct needed to correct the mistake. (14a) and (14b) show that no extra processing complexity is observed even in cases where the disambiguating right context is very far away from the point where the decision about whether to insert an empty operator must be made.

14a.  Who did you search for without telling Sue to convince Bill to ask Harry to come with you?

14b.  Who did you search for without telling Bill to ask Sue to inform Harry that you would meet?

It seems then that these kinds of sentences are problems for both deterministic and nondeterministic (backtracking) parsers. We could solve them if we could design an algorithm in which the semantic component simply didn't interpret empty operators unless they were eventually bound to elements in argument positions. Since these elements have no phonetic content, if they received no semantic interpretation, it would be as if these elements never existed.¹⁰ In that case we could insert the empty operator in all sentences, but we would be sure to be right because an unbound empty operator would simply be ignored, because it is invisible. In fact the two-stage parsing model discussed in our book provides just such a mechanism.

We argued on conceptual and psycholinguistic grounds that the natural language processor was a two-stage mechanism. The first stage dealt with tree expansion and the second dealt with indexation. In addition to having a different function, the second stage worked on a different representation. During the first stage, the completion of a category signaled the parser to shunt the category's daughter into a separate stack, which we called the Propositional Node Stack (PNS). The intuition behind this shunting was that once a category's thematic role was established from its position in the syntactic tree, the parser wouldn't need to retain many of the details of syntactic structure. We showed that elements in the same c-command domain are not put in the PNS until all categories in the domain are complete. This algorithm allowed the parser to correctly compute c-command relations between categories. This was crucial since these relations govern the application of the binding operations on the previously expanded tree. Pursuing the intuition that the PNS was a representation concerned with purely semantic aspects of the interpretation, we placed a semantic visibility condition on the categories appearing in this component. We claimed that to be interpreted by the semantic component (PNS), a category had to have semantic features. These were the features that allowed a Noun Phrase to either denote an individual or a set of individuals or allowed a quantifier to delimit a scope.¹¹ Assuming a category had such features it would be given a "referential index" and be visible in the PNS. If a category did not intrinsically have such features, it could obtain a referential index by being linked to an element that did.¹² Given the shunting procedure, an element would have to be in the same c-command domain as its antecedent in order to receive a referential index before being shunted into the PNS. If an element did not receive an index before shunting, it would become invisible and receive no interpretation. This allowed us to provide a principled explanation for the fact that grammatical conditions specifying c-commanding antecedents seem to
¹⁰An alternative would obviously be to come up with an analysis that did not posit empty operators in these and related cases. Such an account is difficult to conceive of, because we would also have to account for the subjacency effects that these constructions exhibit. By this we do not mean coming up with an alternative functional explanation for Subjacency in these cases. We mean allowing the parser (or the grammar) to distinguish those cases that are grammatical from those that do not obey the constraint.

¹¹Examples of categories with intrinsic semantic features are proper names like John, pronouns like him, and wh phrases like what or which man.

¹²Categories that have no intrinsic semantic features and so can receive referential indices only by linking are bound anaphors like each other or herself, empty NP and wh traces, and certain non-wh quantified expressions. See Weinberg (forthcoming) for details.
apply only to categories with no independent referential status.¹³ Chomsky (1981 and 1984 class lectures) has suggested that association with a thematic (theta) role is also a necessary condition on visibility for semantic interpretation rules. We will adopt Chomsky's suggestion and state the combined condition on visibility as follows.

15.  (Visibility Condition) To be visible in the PNS, an element must be associated with a theta role (either by occupying a theta position or binding an element in a theta position) and must have referential features (features that either designate an individual or set of individuals or that delimit a range).
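
Stated as a predicate, the condition in (15) is simply a conjunction. The sketch below is only an illustration: the Element record and its two boolean fields are our own hypothetical simplification, collapsing "occupies or binds into a theta position" and "has intrinsic or inherited referential features" into flags that a real parser would have to compute.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    has_theta_role: bool             # occupies, or binds an element in, a theta position
    has_referential_features: bool   # intrinsic, or obtained by linking to an antecedent


def visible_in_pns(element: Element) -> bool:
    """Visibility Condition (15): both requirements must hold."""
    return element.has_theta_role and element.has_referential_features


# A bound empty operator versus one that never finds a gap or an antecedent.
bound_op = Element("OP (bound)", has_theta_role=True, has_referential_features=True)
unbound_op = Element("OP (unbound)", has_theta_role=False, has_referential_features=False)
# A trace with a theta role but no antecedent to supply an index.
orphan_trace = Element("e (no antecedent)", has_theta_role=True, has_referential_features=False)

visibility = {e.name: visible_in_pns(e) for e in (bound_op, unbound_op, orphan_trace)}
```

Only the bound operator comes out visible; the other two are ignored by the semantic component.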

We will now show that the independently motivated shunting procedure and
visibility conditions give an account of empty operators that explains why they
cause no processing difficulties.

Let us reconsider sentences (8)-(10). In (8), the parser recognizes that part
of the sentence is an adjunct phrase. This signals the possibility of a parasitic
gap in the subsequent structure. The parser therefore inserts an empty operator
in the COMP position, as shown in (16):

16. Who_i did you meet without [S' OP_j ...

If the parser subsequently finds a gap position in a subjacent domain, it can
create a trace and bind the operator to it, thus associating the operator with a
theta position, as in (17).

17. Who_i did you meet e_i without [OP_i [S greeting e_i]]

Before shunting into the propositional node stack, the operator must locate
an antecedent in the c-command domain with a referential index. If it does not
find one, then neither it nor its trace will be interpreted, because even though
they are associated with a theta role, they are not associated with a category
that delimits a range. In this case the overt operator who is present in the
c-command domain, so both the empty operator and the trace can receive the
category's referential index (i) and so be interpreted in the PNS.

Compare this to (18). In (18) below, the parser will also detect an adjunct.
It will not detect an overt operator, and so no empty operator will be created.
Since there is no empty operator, no parasitic gap will be created in this
structure.

18. Did you watch the movie without [S' OP_i [S eating ]]

¹³ See Berwick and Weinberg (1984, pp. 173-182) for the conceptual argument, and Weinberg
(forthcoming) and Weinberg and Garrett (forthcoming) for psycholinguistic results and
additional consequences of this approach.

In cases like (9) and (10) above, the adjunct and overt operator again trigger
the creation of an empty operator. Since there is no gap in the adjunct phrase,
the operator is not associated with a theta role. Therefore, even though there is
an overt operator to link with, the empty operator does not meet the criterion
for visibility at PNS and so is not interpreted.¹⁴ Since empty operators are not
interpreted unless both conditions on visibility are met, a deterministic parser
can always create these categories because they can never force it to simulate
nondeterminism either by backtracking or parallelism in order to correct for
past mistakes. Note that this solution will only work for empty operators.
Lexically specified elements will receive a phonetic interpretation but no semantic
interpretation, a situation that will lead to unacceptability. An empty element
with no semantic features, however, is neither semantically nor phonetically
interpreted and so simply plays no role in the interpretation of the sentence.¹⁵
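
The case analysis above can be restated as a small decision procedure. The following Python sketch is purely illustrative and is not part of any implemented parser: the names Element, visible_in_pns, and fate are hypothetical labels of our own, and the three boolean fields simply encode the two clauses of the Visibility Condition (15) plus the presence or absence of phonetic content.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A category built by the parser (hypothetical encoding)."""
    name: str
    has_theta_role: bool            # occupies or binds a theta position
    has_referential_features: bool  # intrinsic, or inherited from a binder
    has_phonetic_features: bool     # False for empty operators


def visible_in_pns(e: Element) -> bool:
    # Visibility Condition (15): both requirements must hold.
    return e.has_theta_role and e.has_referential_features


def fate(e: Element) -> str:
    if visible_in_pns(e):
        return "interpreted"
    if e.has_phonetic_features:
        # Pronounced but never semantically interpreted.
        return "unacceptable"
    # Neither pronounced nor interpreted: plays no role in the sentence.
    return "ignored"


# Empty operator in (17): binds a trace and is linked to overt "who".
op_17 = Element("OP in (17)", True, True, False)
# Empty operator in (9)/(10): an overt operator is present, but there is
# no gap to bind, hence no theta role.
op_9 = Element("OP in (9)/(10)", False, True, False)
# A lexically specified element that fails visibility.
lexical = Element("overt element", False, True, True)

for e in (op_17, op_9, lexical):
    print(e.name, "->", fate(e))
```

The point of the sketch is only that an uninterpreted empty operator falls into the harmless third case, so creating one never commits the parser to a decision it must later retract.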

The astute reader will have noted an apparent problem created by this
solution. Why, one might ask, if empty categories can become invisible at later
stages of interpretation, must we cue their creation to the presence of overt
operators?

¹⁴ This approach will also handle empty operators in tough movement, topicalization, relative
clauses, and the NPs that Fodor discusses in her criticism of Marcus. As should be
obvious, since all these structures also involve predication between a phrase and a head,
topic, or adjective phrase, exactly the same logic applies. See Weinberg (forthcoming) for
details.

¹⁵ Throughout this account, we have assumed, contra Chomsky, that the empty operator is
subjacent to the real operator. However, this assumption is not crucial, and remains to be
verified (or falsified) by some fairly subtle empirical facts. To show this, let us assume (with
Chomsky) that empty operators are not in fact subjacent to real operators. Then we must
predict that the possible presence of an empty operator is cued solely by the presence of
the adjunct structure. So in a case like (a),

(a) Did you catch a fish without eating?

the parser could mistakenly output a structure like (b):

(b) Did you catch a fish [PP without [S' OP_j [PRO eating e_j]]]

The empty operator and parasitic gap, having no referential indices, would disappear
from the semantic component's representation. However, the case features on the parasitic
gap would make it visible in PF. In fact, some speakers report an initial bias towards
treating eat as a transitive verb in these structures, and thus say that the sentence sounds
unacceptable. This bias interestingly does not cross over to structures where this verb is
not in an adjunct:

(c) Did you think that Harry told Mary that he expected to eat?

If these sentences reflect true biases, then an algorithm based on Chomsky's definition of
Subjacency would seem more appropriate. Such an account would be fully compatible with
our approach at the conceptual level. We have noted cases in our book where, in order to
be specifiable using terms licensed by the grammar, the Subjacency condition is in some
sense 'stricter' than the parser's needs. Here we have a case where a parser whose rules
are written using the grammar's predicates will sometimes make mistakes. The prediction
is that people will make the same mistakes. The facts here, however, are quite subtle, and
since either alternative is compatible with our approach, we leave the question of whether
to place the Subjacency requirements on the empty operator open.

The cases that motivated the account in the first place were those in
which the local subcategorization of a verb was indeterminate. Before positing
an empty element after such a verb, we claimed that we had to make sure that
an actual operator was present in the previously analyzed structure. However,
given our present approach, one might be tempted to argue that if a verb that
can be optionally transitive turns out to be used intransitively in a given
structure, the gap will simply not be associated with an operator and so become
invisible in the PNS. This seems to dash the motivation for restrictions on left
context, crucial for the functional motivation of Subjacency in the first place.
But it is only elements with no phonetic features that can escape unacceptability
if they are not semantically interpreted. Since wh elements have case features,¹⁶
they will be visible in the phonological component.¹⁷ This makes certain
predictions about the applicability of Subjacency to NP movement. As noted in
Lasnik and Saito (1984), all the cases where we seem to need Subjacency to rule
out unacceptable NP movements are actually also ruled out redundantly by the
Empty Category Principle. Under our approach, we predict that NP movement
should not be governed by Subjacency, thus ruling out this redundancy, always
a welcome result.¹⁸

Looking at the distribution of parasitic gaps from the parsing perspective
allows us to supplement Chomsky's analysis in important ways. It allows us
to derive the fact that parasitic gaps must be licensed at S-structure. That is,
we derive as a theorem the fact that quantifiers and wh operators that move to
COMP or some other pre-S position at LF do not create acceptable parasitic
gap structures, as shown by examples (19a) and (19b).

*19a. [S You [VP [VP met who_i] [PP without greeting e_i]]]

¹⁶ See Chomsky (1981) for justification of this assumption.

¹⁷ See Aoun and Lightfoot (1984) for discussion.

¹⁸ See Weinberg (forthcoming) for details. Note that the non-government of NP movement
by Subjacency reinforces the point made in Berwick and Weinberg (1984), namely, that
Subjacency governs a natural class from the parsing perspective. The example just given
shows that Subjacency only governs a subset of the movement constructions; the gapping
examples discussed later on in this section show that Subjacency governs a subset of the
deletion constructions. From a grammatical viewpoint, this is an entirely unnatural result.

This approach also makes sense of some preliminary results reported by Frazier (1984
NELS conference) and cited by Fodor in her article. Frazier claims that eye movement tasks
suggest that subjects try to fill gaps using operators that are not subjacent to them, if the
verbs governing the gap position are strongly subcategorized for direct objects. The cases
are like those in (a):

a. *What_i did [the girl [S' who won e_i] receive e_i]

Given our approach we might claim that the gap inside the island is created on the basis
of the empty operator in the COMP of the relative clause. The fact that subjects seem to
look back to the overt wh element is compatible with our approach if we claim that this is
the result of the attempt to bind this operator (an operation not governed by Subjacency)
to the overt operator.


*19b. [S Everyone [VP [VP met someone_i] [PP without greeting e_i]]]




We know independently that parasitic gap constructions are not licit if
the real gap occurs in Subject position.¹⁹ In addition, if our analysis is correct,
the overt operator must occur in a c-commanding COMP. As mentioned, the
c-command requirement is ensured by the shunting design of the parser. If an
element does not c-command a category it is not visible to it and so cannot
be used to create that category as we expand the parse tree. Neither the wh
element nor the quantifier in (19a) or (19b) c-commands the adjuncts containing
the parasitic gaps. Given the above account, there will be no binder to
give referential features to the empty operators in the COMPs of these adjuncts
and thus neither they nor their traces will be interpreted in the PNS. Given
that the input for parsing decisions is the S-structure of the sentence, the
subsequent movement of a category to a c-commanding position at a post S-structure
level cannot help the parser decide how to expand the parse tree. Our parsing
theory can derive both the fact that Subjacency is an S-structure property
and the Subjacent government of parasitic gaps along with their licensing at
S-structure, the central properties of the construction.


Gapping  constructions 


Fodor's next criticism deals with our analysis of gapping. She is correct in
claiming that our treatment does not distinguish the subset of gapping constructions
that obey bounding conditions from those that do not. As she points out,
escape from bounding correlates with the appearance of an auxiliary marker in
the pregap position. (20) and (21) illustrate.


20a.  Mary  fishes  in  the  ocean  and  Harry  in  the  sea. 


*20b.  Mary  fishes  in  the  ocean  and  I  think  Harry  in  the  sea. 


21a.  Mary  has  fished  in  the  ocean  and  Harry  has  in  the  sea. 


21b. Mary has fished in the ocean and I think Harry has in the sea.


In our previous analysis we claimed that bounding was expected in gapping
constructions because the complements of the gapped verb had to be correctly
attached in the VP internal or external position. Correct attachment depends
on the properties of the verb. Since an overt verb is not available to direct
the parser in a gapped constituent, we predicted that deterministic attachment
of these complements required a look at left context (some previous conjunct
containing an overt verb). Given the usual requirement of bounded access to
this left context, the bound constraint on these constructions followed. Since the
parser faces the same problem in both types of gapping constructions, Fodor is
right in claiming that we are incorrectly led to the conclusion that the presence
or absence of an auxiliary marker in the gapped constituent should not influence
the application of the constraint. Therefore, in countering this argument we
must show that complement attachment of PPs does not require access to left
context, but that there are other properties of gapping constructions that require
this access only in cases where no overt auxiliary precedes the gapping site.
Let's start with the second point first. Consider the following examples.

¹⁹ See Chomsky (1982).

22. [S I consider [S Bill [VP to be a fool]]]

23. [S I consider [S Bill [NP a fool]]]

In (22) the embedded clause is an infinitival with a VP predicate and in (23)
it is a small clause with an NP predicate.²⁰ The head of the VP predicate in
(22) can be gapped, as shown in (24).

24. [S John believes [S FRED is a FOOL] and [S' HENRY [VP [V ∅] AN IDIOT]]]²¹

Fodor (1975) has shown that (24) actually involves two different deletion
rules. Main Verb Deletion eliminates the verbal be form and Tense Deletion
removes the associated tense. Cast in parsing terms, the interpretation of the
second conjunct involves expanding the parse tree with both an empty tense
morpheme and an empty verb. Note however that the surface string in the
second conjunct is locally ambiguous and could be expanded as a gapped structure
or as a small clause. If we chose the small clause alternative, the sentence would
be ruled out because believe does not take small clause complements, as shown
by (25).

*25. [S I believe [S John [NP a fool]]]

The only way that we can determine the proper expansion of the second
conjunct in a case like (24) is by rescanning the left conjunct. Again we have
a case where a deterministic tree expansion involves left context examination.

²⁰ The structure of small clauses is the subject of some controversy. Chomsky (1981) following
Stowell (1981) argues that embedded categories like Bill a fool form sentential complements
(in this case with the structure [NP [NP John] a fool]). Williams (1983) argues that
these categories do not form a constituent and that they are properly analyzed as
[... [NP John] [NP a fool] ...]. Hornstein and Lightfoot (forthcoming) argue against Williams's
analysis and in favor of a modified version of the Chomsky-Stowell approach. The only point
relevant to this argument, however, is that the predicates of small clauses are not VPs.

²¹ We follow Fodor's convention of indicating the placement of heavy stress on a word by
capitalization.


Given our usual logic, we must ensure that we will never have to look at an
unbounded stretch of left context. Therefore, we predict that cases involving
tense deletion should obey bounding, exactly what Fodor demonstrates. As
additional evidence, consider (26a). If the parsing version of tense deletion is
governed by bounding, then we predict that the small clause analysis will be
the only permissible expansion of the embedded clause in the second conjunct.
Since believe doesn't take small clauses we predict the unacceptability of the
structure, in contrast with the acceptable (26b).

*26a. I think Fred is a fool and Sue believes John stupid.

26b. I think Fred is a fool and Sue believes John is stupid.

In contrast, cases that involve only main verb deletion will never create the
same kind of ambiguous situations. This is because the presence of an overt
auxiliary unambiguously signals that a verb phrase must follow. One never
finds overt auxiliaries in small clauses. Since the parser will always be right if
it expands the phrase after an overt auxiliary as an empty-headed VP, it will
never have to scan the left conjunct. In a case like (27) it simply uses the locally
available overt auxiliary to decide about subsequent expansion of the tree.

27. John has fished in the ocean and Bill has in the sea.

Since we never need to examine left context when the auxiliary remains
in the surface string, we do not expect Main Verb Deletion to obey bounding
constraints. This is in fact what Fodor observes.

This account has another virtue. The information provided by the left
context to resolve the ambiguous cases will be available at the time the parser is
confronted with the ambiguous material of the second conjunct. This contrasts
with our previous analysis where, as Fodor correctly notes, proper identification
of a verb's subcategorization and selectional properties demands access to the
actual verb of the previous conjunct. Unfortunately, our parser will have
already shunted this material into the PNS representation. Our parser shunts at
the end of c-command domains leaving only immediate daughters of the
completed constituent available as information for future parsing decisions. This
is no problem for our new analysis because we distinguish small clauses from
gapped constituents merely by looking at previous conjuncts for the presence of
a tensed auxiliary. If we treat sentences as maximal projections of INFLection
(Chomsky 1981) and if we assume that lexical information about the head of a
category is projected from that head to its most maximal projection, then the
relevant information will percolate up to the highest S node on the tree and
thus be available to the parser for expansion decisions.²²

Consider again a structure like (24), repeated as (28), with irrelevant details
omitted.

By the time the parser reaches the locally ambiguous second conjunct, the
first conjunct will have been shunted to the PNS. Thus information contained
in this conjunct will not be available for decisions about tree expansion. This
causes no trouble because we see that the tensed character of the first conjunct
can be read off the highest INFL projection that c-commands and is boundedly
far from the INFL (INFL') of the next conjunct. If the first conjunct was a small
clause, then the ∅-inflection would also percolate up to the maximal S node. This
is all the information the parser needs to correctly expand the tree of the second
conjunct. If the previous conjunct contains a tensed or infinitival inflection, the
parser expands the conjunct as a gapped structure. If the previous conjunct
contains a ∅ inflection, then the parser expands the ambiguous structure as a
small clause. This analysis makes the interesting prediction that if S's instead
of Ss are conjoined, tense deletion should be unacceptable. Since S' is not a
projection of INFL, conjunction of S's would not allow percolation of information
beyond the first conjunct in a structure like (28).²³ Since expansion as a tensed
structure is conditioned by the presence of an overt auxiliary in the previous
conjunct, the parser will not be able to apply the tense deletion rule. This is
confirmed by comparing (29a) and (29b), where we have conjoined Ss, with
(29c) and (29d), where we have conjoined S's.

²² Projection to the most maximal projection is supported by movement of postverbal Subjects
in Italian. Since these elements occur in structures like (a) we must ensure that the verb
can transmit its features to the maximal VP in order for the trace of the postverbal Subject
to satisfy the conditions on proper government imposed by the ECP.

(a)
       VP
      /  \
    VP    NP

29a. That Frank would hit Sam and Bill would hit Harry surprised me.

29b. That [S Bill would hit Sam] and [S Frank [INFL' ∅] [VP [V ∅] Harry]] surprised me.

29c. That Frank would hit Sam and that Bill would hit Harry surprised me.

*29d. [S [S' That [S Frank would hit Sam] and that [S Bill [INFL' ∅] [VP [V ∅] Harry]]] surprised me]

As predicted, Main Verb Deletion can apply in both conjoined Ss and S's, as
shown in (30).

30a.  That  Frank  would  hit  Sam  and  Bill  would  Harry  surprised  me. 

30b. That Frank would hit Sam and that Bill would Harry surprised me.

Thus this approach correctly distinguishes the two cases of gapping.
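
The expansion decision just described can be summarized in a few lines. The Python sketch below is only an illustration under simplifying assumptions: the function name expand_second_conjunct and the string labels for INFL values are hypothetical, and we treat the case where no INFL information is reachable (as with conjoined S's, or an unbounded stretch of left context) as leaving the small clause expansion as the only option, following the discussion of (26a).

```python
from typing import Optional


def expand_second_conjunct(has_overt_aux: bool, prev_infl: Optional[str]) -> str:
    """Choose an expansion for a second conjunct that lacks an overt verb.

    prev_infl is the INFL value readable from the previous conjunct's
    maximal projection: "tensed", "infinitival", "zero", or None when
    nothing has percolated within bounded reach.
    """
    if has_overt_aux:
        # Main Verb Deletion: the local auxiliary settles the matter,
        # so left context is never consulted.
        return "empty-headed VP"
    if prev_infl in ("tensed", "infinitival"):
        # Tense Deletion is licensed by the previous conjunct's INFL.
        return "gapped structure"
    # Zero inflection, or no INFL information available: the tense
    # deletion rule cannot apply, leaving only the small clause.
    return "small clause"


print(expand_second_conjunct(True, None))       # as in (27)
print(expand_second_conjunct(False, "tensed"))  # as in (24)
print(expand_second_conjunct(False, None))      # as in (26a) or (29d)
```

In the last case the small clause expansion is then rejected independently when the governing verb, like believe, does not take small clause complements, which is how the unacceptability of (26a) arises.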

Returning to our first problem, we must show why the problem of complement
vs. adjunct attachment, which applies in both types of gapping, does not force
the parser to look at left context, thus incorrectly predicting that bounding
constraints apply to both kinds of gapping. The treatment in our book assumed
that the semantic interpretation of adjuncts and complements proceeded in
essentially the same way, by reading off tree structure. If we assume this, then it
follows that a deterministic parser must attach PPs and other adjunct phrases
as they are attached by the grammar, in order to carry out semantic
interpretation. However, this assumption is highly dubious. As Miller and Chomsky
(1963), Marcus (1980), and many others note, in certain cases, strings of adjunct
phrases can occur in potentially unlimited configurations. Thus a sequence like
the man in the house by the river by the woods near the town can have any of
the following interpretations:

[in the house] [by the river [by the woods]] [near the town]

[in the house [by the river] [by the woods [near the town]]]

[in the house [by the river [by the woods [near the town]]]]

²³ See Zubizarreta (1982) and Stowell (1981).

A parser that had to do semantic interpretation from tree structure would
find itself in an exponential regress in such cases. In order to figure out which
interpretation to give the sentence, it would have to compute the correct
syntactic structure, but in order to do this it has to compute all the possible patterns
compatible with this string, and then see which one it "means to say." This will
cause an exponential slowdown in the parsing algorithm, if all trees must be
explicitly reconstructed. One classic solution proposed by these authors is that
adjunct phrases that can be ambiguous (either between adjunct and complement
readings or between various adjunct readings) should be parsed essentially as
flat structures. Semantic subroutines can then come in later and decide between
the possible readings: a procedure that allows us to maintain efficient parsing.
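
To give a feel for how quickly the number of explicit trees grows, here is a small Python sketch. It rests on a simplifying assumption of ours, not a claim from the literature cited above: each new PP may attach to any noun still on the right edge of the tree, and attaching to a higher noun closes off the lower ones (no crossing attachments). The function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_attachments(open_sites: int, remaining: int) -> int:
    """Number of non-crossing ways to attach `remaining` PPs.

    open_sites is the number of nouns on the right edge of the tree
    built so far that can still host a PP.
    """
    if remaining == 0:
        return 1
    total = 0
    for site in range(1, open_sites + 1):
        # Attaching at `site` (1 = highest) closes every site below it
        # and opens one new site: the noun inside the new PP.
        total += count_attachments(site + 1, remaining - 1)
    return total


def readings(n_pps: int) -> int:
    # Initially the only attachment site is the head noun.
    return count_attachments(1, n_pps)


for n in range(1, 11):
    print(n, readings(n))
```

Under this model the four-PP sequence above has 14 distinct bracketings, and the count grows exponentially with each added PP, whereas a flat attachment adds only one node per PP and defers the choice among readings to later semantic routines.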

Put in the context of the gapping constructions, if a parser cannot figure out
where an adjunct is attached from the local context, it can simply attach it as a
flat structure to the lowest node in the parse tree. Then, independently needed
semantic routines will give this phrase its appropriate semantic interpretation.
Thus the attachment of adjunct PPs in neither type of gapping can force the
parser to scan left context. Therefore, the attachment of adjunct phrases does
not incorrectly predict bounding effects in Main Verb Deletion.

3  Objections  to  basic  assumptions:  transparency 
and  determinism 

3.1  What  is  nondeterminism? 

We'll first analyze the distinction between determinism and nondeterminism,
and how Fodor views that distinction. Fodor makes two points:

1. A nondeterministic parser, just like a deterministic one, could benefit from
locality restrictions, if the cost of backup is high.

2. A deterministic parser cannot recover from error, and so cannot comport
with what is known about human processing of sentences.

Nondeterministic  parsers  do  not  reflect  processing  complexity 

Let's take these points in turn. First, as we said earlier, one must distinguish
between two versions of the nondeterminism hypothesis: true nondeterminism,
where all possibilities are explored in parallel, and simulated nondeterminism,
where one possible parse is explored at a time, and backup occurs if one line
of attack fails. Only the first version makes the nondeterministic/deterministic
parsing distinction clearcut, and this is the one we chose for comparison. The
second version of nondeterminism is just like the Marcus model in that a single,
particular sequence of parsing decisions is made as we move through the
sentence, left-to-right. It is unlike a deterministic model in that revisions in that
sequence of decisions are assumed to occur all the time.

Fodor does not make the clearcut choice. Instead, she opts for a
deterministic, one-path-at-a-time simulation of true nondeterminism. This position is
quite weak, because, as Fodor notes, one can turn this simulation into the
functional equivalent of a deterministic parse simply by making the cost of revising
decisions very high:

Every point that M. makes could have been made just as well within
the  context  of  a  nondcterministic  parser  which  cared  about  efficiency. 
(Fodor,  page  18) 

Imposing a cost metric on backup, then, gives us more flexibility. But is
this too much flexibility? There are three basic options. If we say that backup
costs are zero, then we have in effect the case of true nondeterminism; if we say
that backup costs are infinite, we have a Marcus model. If we make the costs
somewhere in between zero and infinite, we get a middle view.

Fodor takes this as a virtue: all bases are covered. But is this so? Do we
need at least this three-way split? If one is going to impose a constraint on a
weaker system that has the functional effect of determinism, it would seem just
as sensible to start with that constraint in the first place: assume the machine
is deterministic, and see if the required psycholinguistic complexity options can
be obtained this way. Cutting up the constraints this way makes a difference. A
"cost" metric is the weaker position, because we must justify the metric we use
somehow. That is, we must support both the assumption of nondeterminism
and a particular cost metric. In contrast, a deterministic machine is directly
built to act as if backtracking costs are very high. There is no separate cost
metric device in the Marcus parser; therefore we need not justify one. All we
need to justify is the assumption of determinism, which we must do in any case.

There could be other grounds for the flexibility allowed by a cost-metric
addition to the nondeterministic model. In a footnote to her paper, Fodor tries
to turn the cost-metric model to her advantage, as a way to simulate observed
human sentence processing. Fodor attempts to equate backtracking cost with
processing difficulty:

But it could very well be that the really severe garden path
sentences ... are those for which all the wrong (=correct) initial choices
are reconsidered before the one that was truly at fault. This is
where the 2^n figure would approach a realistic estimate of parsing
time, and it would nicely account for the inordinate difficulty of these
sentences.... Thus the striking differences that have been observed
in the processing difficulty of natural language sentences are
perfectly consistent with the mathematical results for nondeterministic
parsing with online backup.

Fodor is claiming that a garden path sentence such as the horse raced past the
barn fell demands exponential parsing time because of backup, while relatively
easier "nongarden path" sentences (such as they told the students that John liked
that Bill would leave) do not. But it is easy to see that both of these require
the same amount of backtracking. The problem is that in a direct backtracking
implementation, backup occurs all the time, even on simple sentences. For the
first sentence, a backtracking parser must make a decision just before raced,
between a relative clause and a VP. Assuming frequency preference, it takes
the VP reading, which fails when fell is encountered. Now it must back up.
We'll assume the last previous choice point was before that John. In fact,
this is not correct. In a pure backtracking parser, we would have to unwind
to all intermediate choice points: there might be a relative clause after barn;
there might be an NP object after raced; and so on. Finally, we arrive at the
choice at raced and can continue. If the machine can inspect the current word
it is scanning, two or three choice points are involved.²⁴ More backtracking
correlates with processing difficulty. Even so, such a sentence would not be
impossibly difficult for a backtracking parser. (And remember that it would
be perfectly easy for a true nondeterministic parser.) In fact, the backtracking
parser does not do exponential work on such an example.

What of the second sentence? Fodor must claim that such a case causes
little or no backtracking, relative to garden path sentences. But here too, a
backtracking parser must do a lot of work: before that John liked we call for an
embedded Sentence instead of a relative; similarly before that Bill. When we
get to would we must back up. First, we unwind to that Bill and try a relative
clause reading for it. This fails. Then we back up to the next previous choice
point, and try alternative categorizations for like. Finally, we arrive at the
choice between a relative and an embedded S just before that John liked.**
Roughly the same backup takes place here as with the "real" garden path.
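The gap between Fodor's 2^n worst case and the two or three choice points
actually at issue can be made concrete with a small counting sketch. It is an
illustration under simplifying assumptions, not a model of any parser discussed
here: every choice point is binary, failure is detected only at the end of the
sentence, and the function name analyses_tried is hypothetical.

```python
from itertools import product


def analyses_tried(later_choices: int) -> int:
    """Count the complete analyses a chronological backtracker tries.

    Assumptions (illustrative only): choice 0 is the faulty one, so its
    preferred option 0 is wrong and option 1 is right; every later choice
    point is binary and its preferred option 0 is correct; failure is only
    detected at the end of the sentence.
    """
    tried = 0
    # product() varies the last position fastest, so the most recent choice
    # is revised first, as in chronological backtracking.
    for choices in product((0, 1), repeat=later_choices + 1):
        tried += 1
        if choices[0] == 1 and not any(choices[1:]):
            return tried
    raise AssertionError("unreachable: a successful analysis always exists")


for k in (2, 3, 10):
    print(k, analyses_tried(k))
# 2 5
# 3 9
# 10 1025
```

Under these assumptions the work is 2^k + 1 for k later choice points, so it is
exponential only in the number of choice points that follow the faulty one.
With two or three such points the count stays small, which is the sense in
which the garden path example does not demand exponential work.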

Of course, there might be some other parsing scheme to get us out of this
particular dilemma. The problem is that any general scheme to make
backtracking easy will almost necessarily make the garden path sentences easy as
well. At heart, a backtracking parser backtracks, and it is quite difficult to use
ad hoc cost measures to make it perform otherwise.

**A "pure" ATN does not even look at the current word it is scanning in order to make
a guess about what to do next. But this means that even very simple sentences such as
Be careful involve extensive backtracking, because the machine guesses that it will see a
declarative sentence, then a question, and so forth. This alternative would simply make
our point even more strongly, so we won't adopt it.

**Using standard ATN techniques, preference for one phrase type rather than another
can be encoded by ordering the arcs that leave a network state. One can order the arc
alternatives so as to take a relative clause push after that, but then this will be wrong and
fail to account for the preferred embedded-S reading of they told the students that John liked
the story.

Deterministic  parsers  can  recover  from  garden  paths 

Let's now turn to the second point, about deterministic parsing and error
recovery. While Fodor wants the flexibility to simulate determinism when needed in
her own model, she denies flexibility for a deterministic parser to recover from
garden paths:

The only difference between a deterministic parser and a
nondeterministic parser is that in the former a garden path analysis is
permanent and unrepairable, while in the latter garden paths can
occur and be recovered from during the parse. (Fodor, page 18)

But again, as Fodor acknowledges in her footnote 20, this is not to deny
that there could be specialized deterministic recovery procedures for garden
path sentences, as suggested by Marcus (1980). For these procedures to apply,
we would of course toe the line of determinism: backup along the lines suggested
by Fodor (or in an ATN) would not be permitted. Ideally, following Marcus's
definition, the recovery procedure should only be allowed to add information
about the parse, not wipe out what has already been learned. Instead, when the
parser blocks (because no known rule applies), a recovery procedure could look
globally at the state configuration of the parser. Then, by slightly rearranging
existing subtrees of the parse, the recovery procedure should simply add new
information about the sentence analysis and come up with the correct sentence
structure.

Interestingly enough, the Marcus design, slightly modified, provides the
ingredients of just such a theory of garden path sentence recovery. We can only
sketch the basic idea here.

Let us consider again the horse raced past the barn fell. When a Marcus-type
parser fails on such a sentence, it is reading fell. But there is much information
in its machine configuration (its pushdown stack and input buffer) of value
for error recovery. It is possible to design a natural recovery procedure that
uses this information deterministically to build the correct output, though at
some cost. For example, in the horse raced example, one need only insert
a new S boundary between horse and raced. There is also room within an
evaluation metric of recovery to differentiate between difficult garden paths and
easy-to-analyze sentences with interpretations. Barton and Berwick (1985) give
some of the details. Contrary to what Fodor asserts, recovery is possible in a
deterministic machine.
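The idea of inserting a new S boundary without discarding anything already
built can be sketched as follows. This is only a toy illustration, not the
procedure of Barton and Berwick (1985) or of the Marcus parser: the Node class,
the recover function, and the particular tree shapes are all assumptions made
for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # child Nodes or word strings


def words(node):
    """Return the left-to-right words under a node."""
    out = []
    for child in node.children:
        out.extend(words(child) if isinstance(child, Node) else [child])
    return out


def recover(subject_np, main_vp, blocking_word):
    """Hypothetical recovery step for a blocked parse.

    The old VP is wrapped in a new S (read as a reduced relative) attached
    inside the subject NP, and the blocking verb becomes the main VP.
    Existing subtrees are reused as they are, never deleted or rebuilt.
    """
    relative = Node("S", [main_vp])  # the new S boundary before 'raced'
    new_subject = Node("NP", [subject_np, relative])
    return Node("S", [new_subject, Node("VP", [blocking_word])])


np = Node("NP", ["the", "horse"])
vp = Node("VP", ["raced", Node("PP", ["past", Node("NP", ["the", "barn"])])])
tree = recover(np, vp, "fell")
print(" ".join(words(tree)))  # the horse raced past the barn fell
```

The point of the sketch is that the subtrees for the horse and raced past the
barn survive untouched inside the repaired structure; only new nodes are added
above them.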

3.2  A  two-stage  design? 

Fodor also takes issue with our division of parsing labor into separate
tree-building and indexing stages. Again, she makes two basic points: first, that this
division is not motivated on grounds of computational efficiency; and second,
that this division is not motivated by the grammar (so that we are violating our
own assumption of transparency connecting grammar and parser). Again, we
disagree.

Consider computational efficiency. Fodor first claims that computational
reasons alone can't motivate the bounded-context character of our parser:

Given that the efficiency results for bounded-context parsing are no
better than for LR(k) parsing in general, the crucial assumption that
the first stage of B&W's parser is a bounded context device receives
no support from these efficiency results. (Fodor, page 41).

But as Fodor herself notes, computational complexity calculations are often
relative to representational issues. If one picks some other representational
format, then certain computational issues can become irrelevant. For example,
if we adopt true nondeterminism, then it is not difficult to parse any sentence of
a context-free grammar, no matter how ambiguous, in time proportional to the
square of the grammar size and the cube of sentence length (where the grammar
is measured in terms of the total number of grammatical symbols, like NP and
VP, not just rules; see Earley (1968)).

This being so, one cannot divorce a discussion about computational
efficiency from representational format. We have chosen to represent the parser's
knowledge transparently, that is, to include only those categories sanctioned
by the grammar. The categories of our grammar include only the basic lexical
projections NP, VP, PP, and so on.** By saying that our parser works
transparently, we mean that the parser's rules can only make reference to these literal
symbols. To put the same point another way, transparency requires that the
only states the parser has are the "states" (i.e., the nonterminal names) that
the grammar has. The parser cannot use any derived facts about the grammar;
nor can it appeal to nonterminal symbols that do not otherwise exist. For
example, the parser cannot create a new state in order to "remember" that a wh
phrase has been encountered earlier in the sentence. This would correspond to
a complex nonterminal name such as WH/NP.

**Like most syntactic theories since Aspects of the Theory of Syntax, we also include traditional
agreement features like Person, Number, and Gender, as properties of lexical projections.
We explicitly do not include the "slash" feature of Generalized Phrase Structure Grammar
(resulting in complex categories like VP/NP), since this feature is not lexically projected
(X^0 or lexical items are specifically barred from having "slash" features in GPSG).

In general, LR(k) parsers are allowed to create such states whenever they
are needed. These states (in the form of a finite-state control table) encode the
set of possible left-most derivation patterns for the given grammar. Since they
represent derivation regularities, these states need not map in a 1-1 fashion to
the nonterminal names of the grammar, and in fact the wh sentence example
shows that in some grammars the nonterminals do not match the states of the
parsing machine.** However, we have specifically barred the use of parsing
states that do not correspond to lexically projected nonterminal names.
Therefore, our approach does not admit the entire class of LR(k) parsers. Instead,
our parsing rules can make reference only to grammatical symbols. There is a
class of deterministic parsers that defines such a class of machines, namely, the
bounded-context parsers.** This is the parsing design we have adopted.

Fodor is correct that general computational grounds do not force the
bounded-context choice on us, but that is trivially so. For example, if we adopted a
more powerful device, such as a nondeterministic device, we would not need
this structure. But, all other things being equal, it is the stronger assumption.
Transparency is stronger, because we need not posit any entities beyond those
the grammar already gives us; and all other things are equal, because in this
case "all other things" is simply parsing efficiency and an account of the
psychological facts about parsing unbounded dependencies.** It is of course true that
a parser need not respect the representations provided by the grammar. But it
is simpler to assume that it does. A grammar that contains just projections of
lexical items is smaller, simpler, and hence easier to learn than one that does
not. There's a sense in which such a parser is completely lexically based: there
are just projections of lexical items, and nothing more.

Fodor also argues that transparency itself does not motivate a literal
bounded-context parser, because the grammar contains rules that mention variables: "as
long as the transformational rules of the competence grammar can contain
variables (explicit or implicit) we would expect parsing rules employing the same
metalinguistic vocabulary to do the same." She concludes that we need "an
explicit prohibition against variables in the parsing rules." (Fodor, page 47). But
again, there are two parts to any computational operation: the procedure itself,
and the data structure or representation it works on. In this case, there are
no variables because there are no complex category symbols, and because the
rules of the machine are finite. As Fodor notes, these are indeed "stipulations"
(page 48); one must always assume something in arguments about
computational matters, since we don't have the luxury of neurophysiological findings.

**This transparency distinction also shows up in the way that LR(k) parsers are built. The
usual approach is to process an LR(k) grammar to derive a finite-state control table that is
actually used for parsing. The states of this table need not, and usually do not, correspond
in any transparent way to individual nonterminal names. Instead, in effect they stand for
theorems about derivations in a particular grammar. By banning such nontransparency, we
are banning such preprocessing.

**See Floyd (1964). Actually, we must define an extension of the bounded-context parsers that
uses nonterminal lookahead as the Marcus machine does. For details, see Berwick (1985).
We could also vary other details of the bounded-context design, as long as we retain the
key feature: parsing rules must refer only to grammatical symbols, not to parsing states.

**To make the same point in reverse, the only evidence for the more powerful machinery of
a hold cell or "slashed" categories seems to be the ability to parse unbounded
dependencies. But if this can be explained without resort to such machinery, then this leaves its
justification unestablished.


Similarly, Fodor "stipulates" that a grammar allows machinery beyond basic X-bar
categories, and that the parser includes backtracking as a standard feature. The
question is how natural these stipulations are. In fact, in Government-Binding
theory, the rule Move α does not have variables (Chomsky 1977, 1981 is quite
explicit on this point). Deletions, on the other hand, can have variables, but
this is not relevant for parsing because deletions are locally unambiguous (see
the previous section on Gapping and Berwick and Weinberg (1984)).

Beyond this question of bounded-context parsing, Fodor then goes on to
question our division of parsing into two stages at all. She again claims that we
violate our own criterion of transparency and that such a division is not needed
on grounds of efficiency.

The efficiency counterargument, at least in one form that Fodor gives, goes
something like this. Our second stage procedure computes referential
dependencies, for example that John and he may denote the same person in
sentences like this:

John_i believes that Fred thinks that Sue said that he_i is smart.

Since this procedure, whatever it is, must be able to search unbounded
domains, why not just let it do the job of searching for the antecedent of a wh
phrase? Alternatively, why not just fold the two stages together, combining both
jobs into one? In effect, Fodor wants to "multiply out" the two representational
levels we have distinguished into a single one because this is more efficient.**

Since Fodor elsewhere (Crain and Fodor 1984) has herself argued for the
computational benefits of nonmodular representations, it is worthwhile to see
just what is at stake here. Fodor's support for nonmodularity is surprising.
First of all, from the standpoint of computer science generally, it cuts against
the grain of all that is known about the efficient solution of complex problems.
(See, e.g., standard works on algorithms, such as Knuth, 1973; Aho, Hopcroft
and Ullman, 1974.) Second, the key point is that for modularity to work the
distinct levels should have different representational properties, because each
is designed to highlight different aspects of the same problem. This is the
source of the power behind the idea of two levels of representation, words and
phrases. It is easier to state the facts about agreement if we use Noun Phrases
and Verb Phrases rather than simple words, because then we have just two
simple representational units adjacent to one another (NP next to VP). In fact,
a simple finite-state automaton suffices, given that the phrases are constructed
first. Similarly, there are facts about language that are more easily stated in
terms of a linear arrangement of words, e.g., that a Determiner precedes a Head
Noun, and may agree with it. This (oversimplified) factored representation
can be modeled as a cascade of finite-state transducers, where the first level
system, that of words, builds a phrasal representation and feeds the second
level. Is it possible to collapse those two levels into one? Yes; one can "multiply
out" all combinations of words and eliminate the phrasal level, by forming the
product of the two finite-state machines representing each level (see Berwick
1982). However, it does not make sense to collapse these two levels into one.
The collapsed representation is much larger, because all possible combinations
of constraints, previously independently expressed at each level, are now written
out explicitly. In general, if the constraints
on one level can be expressed by a machine of size n, and the constraints on
a second level can be expressed by a machine of size m, then the collapsed
machine could be of size nm.** In fact, this is one traditional argument for
a multiple-levels view of language, as initially expressed in Chomsky's Logical
Structure of Linguistic Theory. There are two computational advantages to the
modular view: one, just mentioned, is that the resulting system is easier to
learn, if we equate smaller size with easier learning; the second is that we can
design computational procedures tailored to work with the specific formats of
each level.

**At times, Fodor suggests just the opposite, as when she proposes that the first and second
stages divide computational labor between them: "the first stage device might
call on the second-stage device to do the antecedent check prior to trace postulation. This
might call for a slightly more complicated routine to pass control back and forth between
the two, but the labor saved could very well ..." (Fodor, page 43)
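The size claim for collapsed levels (machines of size n and m multiplying out
to a machine of size nm) can be checked on a toy case. The sketch below uses
the ordinary product of two deterministic finite-state automata as a stand-in
for multiplying out two levels of a cascade; the counter machines and the
helper names counter_dfa and product_states are assumptions chosen only to
make the state counts easy to see.

```python
def counter_dfa(symbol, modulus, alphabet):
    """Transition table of a DFA that counts `symbol` modulo `modulus`."""
    return {
        (state, ch): (state + 1) % modulus if ch == symbol else state
        for state in range(modulus)
        for ch in alphabet
    }


def product_states(delta1, delta2, alphabet, start=(0, 0)):
    """Return the reachable states of the product of two DFAs."""
    seen = {start}
    frontier = [start]
    while frontier:
        s1, s2 = frontier.pop()
        for ch in alphabet:
            nxt = (delta1[(s1, ch)], delta2[(s2, ch)])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


ALPHABET = "ab"
level_one = counter_dfa("a", 3, ALPHABET)  # n = 3 states
level_two = counter_dfa("b", 4, ALPHABET)  # m = 4 states
collapsed = product_states(level_one, level_two, ALPHABET)
print(len(collapsed))  # 12
```

Kept separate, the two constraints cost 3 + 4 = 7 states; multiplied out, they
cost 3 x 4 = 12. The gap widens quickly as the component machines grow, which
is the learnability point made in the text.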

This is exactly what we aimed for in our two-stage model. Each level has a
different representation that highlights different aspects of the computation of
linguistic structure, and each is designed to ease the computation of properties
relevant to that level. The first level deals with questions of how to build a
tree, and uses notions like dominate, precede. For example, in the sentence
example we gave just above we expand the tree in exactly the same way no
matter whether he is bound to Fred or whether it is a free pronoun bound to a
discourse NP that occurred much earlier. This contrasts with cases governed by
Subjacency. The presence or absence of an antecedent tells us how to expand
the tree we are building. If there is an antecedent in the structure and a verb
that selects or subcategorizes for an NP, we create a trace slot in the phrase
structure; otherwise, we do not. This is a decision about tree structure.

Roughly speaking, referential dependencies can cut across sentences and
involve all the objects mentioned in a discourse, plainly outside the purview
of sentence tree predicates. Secondly, referential dependencies are calculated
on a different representational base from phrase structure, just as Subject-Verb
agreement is calculated at the level of phrases rather than words.

What would happen if we tried to collapse the referential dependency
calculation together with tree-building is exactly what would happen if we tried to
compute Subject-Verb agreement at the level of words. As we show in our book
(Berwick and Weinberg 1984), our first stage procedure works in linear time,
in time cn, where c is a constant depending on the size of the output phrasal
structure and the size of the grammar, and n the length of input sentences.

**For more realistic representational formats, e.g., context-free grammars, the savings can be
even larger. See Berwick 1982 for details. See the next section for additional comments on
this problem and grammar size.


The search for referential antecedents would now have to look at a
representation defined over complex tree shapes, including many irrelevant structures.
We note in our book that in the worst case this would increase analysis time
to kn^2, where n is the length of the input sentence, and k is some constant
that depends on the size of the phrase description. It is already apparent that
pronoun referential dependency can extend across sentences. It is also apparent
that this computation can be nonlinear: consider the laborious calculation that
seems to occur when one uses a pronoun whose antecedent lies many sentences
behind in a discourse. What Fodor wants to do by combining these two steps
is make the first stage procedure nonlinear as well. But as she herself notes
(page 68: "in general, linear time parsing is surely just what a model of the
human sentence processing mechanism should aim for"), this would have the
unfortunate effect of making the construction of tree structure for single
sentences potentially nonlinear. We want to avoid this. We would like to recover
the right tree structure in linear time, even if the pronoun antecedents are not
in place. Note that there is much we can interpret about a sentence if we have
its correct phrase structure, even if we do not know that he is dependent on an
earlier NP. Fodor's collapsed scheme in effect forces the machine to stop and
wait for the right antecedent calculations to complete before plunging on.**

By factoring apart the stages of tree-construction and referential dependency
calculation, we gain at the second stage as well because the size of the structures
the search procedure works over can be made smaller. That is, instead of
running our procedure in time cn^2, where c is large, we can run it in time
kn^2, where k reflects only a short list of NPs. As we noted in our book, this is a difficult
argument to make because in most cases sentences are short. But let us see what
it means in detail. The second-stage representation includes shunted predicates
and NPs. It is a simple matter to take this propositional representation and
build a finite-state transducer (standing for a homomorphism) that projects just
the NPs from this second list. We may imagine this projected bag of NPs to
be the discourse NPs for this sentence; it could include, perhaps, the NPs for
previous sentences, but just NPs. It is because we have now isolated these
units on a separate level that the search for referential dependents is easier. No
other units stand in the way of a direct search through the NP list. In most
cases, there will be only a few NPs to look at. Note that this method only works
because we have set up the first stage to build just the right structured list so
as to provide the right NPs to look through. Further, in those cases where
the list is large, we expect to find nonlinear processing difficulty: informally
at least, precisely what seems to happen when there are many potential NP
antecedents.**
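The projection step described above can be illustrated with a short sketch. It
is not the second-stage procedure of our book: the NP record, the agreement
features, and the function names project_nps and candidate_antecedents are
assumptions made for the example, and agreement in number and gender stands in
for whatever conditions a real coindexing procedure would apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NP:
    text: str
    number: str  # "sg" or "pl" (assumed feature values)
    gender: str  # "m", "f", or "n" (assumed feature values)


def project_nps(second_stage_list):
    """Project just the NPs out of a mixed list of predicates and NPs."""
    return [item for item in second_stage_list if isinstance(item, NP)]


def candidate_antecedents(pronoun, nps):
    """Return earlier NPs that agree with the pronoun in number and gender."""
    earlier = nps[: nps.index(pronoun)]
    return [
        np for np in earlier
        if (np.number, np.gender) == (pronoun.number, pronoun.gender)
    ]


john, fred, sue = NP("John", "sg", "m"), NP("Fred", "sg", "m"), NP("Sue", "sg", "f")
he = NP("he", "sg", "m")
second_stage = [john, "believes", fred, "thinks", sue, "said", he, "is smart"]

nps = project_nps(second_stage)
print([np.text for np in candidate_antecedents(he, nps)])  # ['John', 'Fred']
```

Once the NPs sit on their own list, the search touches only four items for
this sentence and never inspects predicates or tree structure, which is the
sense in which isolating the units on a separate level makes the search easier.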

**One could design a "pipelined" scheme where a second-stage referential dependency
calculation works off the input from a first-stage device. But this is just our two-stage model in
another guise.

**That is, a linear list of this kind, if long enough and if it included discourse NPs, might
take linear time to search for any single NP. Of course, there are other possibilities, since


To summarize, we argue that isolating the referential dependency calculation
in this way pinpoints an important functional distinction between building tree
structure and referential dependency. Tree construction is fast (linear time, and,
in fact, realtime if one examines our procedure in detail); each phrase is built in
a bounded amount of time; coindexing (or referential dependency calculation)
does not interfere with this, for it can be nonlinear. Fodor's proposed one-stage
model, because it interweaves these functionally distinct processes, slows both
down.

3.3  Another  source  for  locality  principles? 

Finally, Fodor contends that locality principles could be motivated in a
GPSG-type theory, both on grounds of easy parsability, and (another point that we
ourselves note) on grounds of learnability:

This negative result does not mean that subjacency could not be
functionally grounded in a GPSG. As chapter 3 observed, there are
many possible "functional" constraints that could have played a role
in the shaping of language. Foremost among these, at least
traditionally, is learnability. (Berwick and Weinberg 1984:166)

Fodor makes two specific proposals along these lines, one for parsability, and
one for parsability/learnability. Let's take each in turn.

Consider first her argument that a GPSG parser would benefit from locality
constraints resolved by context on the right, in sentences such as Who did you
help ..., where the parser must decide whether to insert a trace after help
or keep going so that the trace will appear in some lower complement. But
once again, this constraint just doesn't matter under the true nondeterministic
model. Advocates of GPSG often cite the parsing results for general context-free
grammars as evidence that such a system will work efficiently. But then, Fodor's
demand for constraints on context becomes more mysterious. Suppose one uses
Earley's parser for context-free grammars. This is one standard algorithm on
which the efficiency results for generalized phrase structure grammar are often
based. Then all parses are kept in parallel, and there's no problem at all: both
alternatives are carried along, and when the problematic gap appears or fails to
appear, one of the possibilities falls by the wayside. There is no reason that the
locality constraint must exist. The point is not that the GPSG parser cannot
be made to benefit from a locality constraint but that it doesn't need to benefit
from a locality constraint in the right-context situation.**

not much is known about the representation of semantic structures. For example, it could
be that such NPs can be accessed in constant time, up to a certain memory limit, as if one
could instantly remember the last 10 things mentioned. If so, then processing difficulties
might not show up on short sentences. Like so many other details about processing, this
one hinges on representational questions that we cannot answer in detail as yet.

**Alternatively, one could dispense with the Earley algorithm and come up with some other
parsing algorithm for these systems. But then it remains to establish that this alternative


What about our trace-based parser, then? Why can't we add similar
parallelism and thus avoid the need for a locality constraint? Remember that our
parser design does not have complex categories such as S/NP, VP/NP, and so
on; it can use just the unalloyed categories provided by X-bar theory. It does not
use a hold cell, or any other special memory. Given these transparency
constraints, it is interesting that while true nondeterminism will make a locality
constraint for right-disambiguating contexts superfluous, it actually leaves the
demand for Subjacency unscathed. Consider what happens if we had a true
nondeterministic, trace-based analysis of sentences such as What did Mary say
... that John ate? Note that the analysis is completely determined up to the
point that the "gap" after ate is encountered. That is, the parser is not
carrying along two analyses at this point, as it is in the right-context case. At
ate the parser takes the nondeterministic solution: it writes out one parse with
the trace inserted, and one with it not inserted. But now what? The sentence
ends. No additional information is forthcoming, and yet there are still two
viable analyses of the sentence. One of these is grammatical (where the trace is
inserted) and the other is not, so the output is ambiguous. But the sentence is not interpreted
as having two analyses, one grammatical, one not. There is no evident way to
force the other reading out. Thus, the nondeterministic analysis actually makes
things worse here: it yields two candidate interpretations when only one will
suffice. To resolve these, we must now rescan the output analysis tree, to pick
up whether a wh was present, adding to the computational cost. Right-context
won't help us here, because there is no right-context. But there's no evidence
that this reanalysis occurs, or that such a sentence is hard to process. We
conclude that nondeterminism does not help us if we have only the categories S,
NP, VP, etc. and no Subjacency; on the contrary, it hurts. Thus, Subjacency
is still predicted in our model, unlike Fodor's. Note that this is quite unlike
the right-disambiguating context case, where pursuing alternatives in parallel
allowed us to hold off making a decision until information became available.

What about the second proposal, about learning? Just before her conclusion,
Fodor suggests that a GPSG system might need locality constraints to make its
rule system smaller, hence more easily parsable, and, as suggested in the other
papers where she has advanced this proposal (Fodor 1984), more learnable.

In the absence of any details about just how easy or hard it is to parse a
full-scale derived rule system, it is difficult to judge this proposal. We must
first emphasize that Fodor here is talking about a grammar that explicitly lists
possible phrase structure patterns rule by rule. This is rather different from
the current GPSG framework that represents a grammar via a set of dominance and
precedence statements (ID/LP format) for basic phrasal relationships,
implicational statements to encode feature redundancies, and metarules to
account for systematicities like active-passive sentences (Gazdar, Klein,
Pullum, and Sag, 1985). What one finds is that in any reasonably full-scale
grammar, for, say,


parsing method, whatever it is, is efficient. Fodor does not offer a concrete alternative.


English, the explicit rule system is so large that there's only marginal gain in
"reducing" the size of an explicit rule system in the manner Fodor suggests.
This is because the reduction is minuscule compared to the total overall size of
the rule systems themselves. Let's see why this is so.

To begin, we must be precise. Since Fodor wants to make an argument about
improving parsing efficiency by reducing grammar size, let us define grammar
size, |G|, as the total number of symbols in the grammar accessed for parsing.
This is the standard measure. (See Earley 1968 for discussion.) We do not
want to use the total number of individual rules of the grammar, because this
would weight against rule systems with "short" rules (e.g., A→BC; B→DEF as
opposed to A→DEFC).

Let us now compare the grammar size of an explicit phrase structure rule
system that allows a one-S extraction constraint vs. one that allows extraction
across three S's. Elsewhere (Fodor 1984), Fodor has suggested this as an
example of the benefits of constraints: the tighter the constraints on
extraction, the fewer the rules. While this is literally true, the problem is
that such a grammar is already so large that any minor effect imposed by one
new constraint is swamped out.

It is of course quite difficult to know what the "true" grammar size for
such a system is, because we do not know what the "true" grammar of any
natural language is, even of English. However, we can say this much: any such
explicit rule system must have a rule for every possible surface phrase structure
pattern. How many such patterns are there? Perhaps the most systematic study
of such patterns has been carried out in the context of Sager's work (1981).
For instance, Hobbs (1974) estimates that a subpart of the Sager grammar,
when expanded out into a context-free form, would be "about several orders
of magnitude larger" than the 290 productions and 300 context restrictions it
contains in context-sensitive form (1974:132). That is, the expanded grammar
size would be about 20,000 to 60,000 context-free rules. We take this as a
fairly conservative estimate of the number of explicit, rule-by-rule descriptions
of phrase structure patterns in English.

The Earley algorithm runs in time at most |G|²n³, where n is the sentence
length in tokens. That is, using the Earley algorithm with a fully-expanded,

The initial grammar's productions are in Chomsky normal form, and therefore have a size of
3 per production. Thus the initial grammar size is about 900, with 300 context restrictions.

Note that most grammatical descriptions that appear in the computational literature in
fact describe only small fragments of natural languages. This is quite reasonable, since
they are often designed to illustrate one or another theoretical point, or work within a
sublanguage that serves some functional end (like database retrieval); they are not
designed for broad coverage. For instance, the example GPSG system described by Gawron,
King, Lamping, Loebner, Paulson, Pullum, Sag, and Wasow, 1982 for database retrieval has
an expanded grammar size of about 1500 to 1800 (1982:77), but does not include many
sentence types and restrictions of the Sager grammar. For instance, appositives and
sentence adjuncts of many
different types are not included (like ... the man that ...; Whatever you say; ..., the very
same person you saw yesterday, is ...).


explicit rule system for English, the running time would be at worst 1.6 ×
10⁹n³, or about a billion × n³. The result is that any change brought about by
introducing a constraint on extraction across one S rather than, say, three, is
irrelevant. The base grammar with three-S extraction will need two or three
extra nonterminal symbols, in order to "count" how many S's have been crossed
(S1, S2, S3). Suppose this adds 50 new rules. What happens to parsing time?
It is "exploded" from 1.5 billion n³ to 2.4 billion n³, an increase, to be sure,
but one that cannot possibly matter, because the constant factor is already so
large.
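
The arithmetic behind a comparison of this kind can be sketched as follows.
This is only an illustration under stated assumptions: the base size of 40,000
symbols is chosen because its square is 1.6 × 10⁹, each of the 50 added rules
is assumed to contribute 3 symbols, and the sentence length of 10 tokens is
arbitrary. The sketch is not meant to reproduce the rough figures quoted above;
the function name `earley_bound` is our own.

```python
def earley_bound(grammar_size, sentence_length):
    """Worst-case Earley bound |G|^2 * n^3, ignoring constant factors."""
    return grammar_size ** 2 * sentence_length ** 3

BASE_SIZE = 40_000        # assumed |G| for the expanded grammar
EXTRA_SYMBOLS = 50 * 3    # assumed: 50 new rules, 3 symbols each
N_TOKENS = 10             # assumed sentence length

base = earley_bound(BASE_SIZE, N_TOKENS)
extended = earley_bound(BASE_SIZE + EXTRA_SYMBOLS, N_TOKENS)

# Fractional growth of the bound caused by the added rules.
relative_increase = extended / base - 1

print(f"base bound:        {base:.3e}")
print(f"extended bound:    {extended:.3e}")
print(f"relative increase: {relative_increase:.2%}")
```

Because the bound depends on the square of the total grammar size, the effect
of a few dozen extra rules is governed by their size relative to the tens of
thousands of symbols already present.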

We do not mean to take this as a serious calculation; it is quite speculative.
However, the qualitative point still stands. This exercise is simply designed
to demonstrate that an explicit rule system doesn't exhibit the right kind of
demarcation between one and more than one that is so characteristic of natural
languages. Details about grammar size aside, if extraction across two domains
does not lead to a processing burden, then it is hard to say why three rather
than four or five domains does. Any system grounded on explicit phrase structure
rules does not naturally distinguish between a locality condition that acts
over, say, three domains and one that acts over a single domain. We just saw
that there could be no relevant difference for parsing, or for learning (if we
equate size of rule system with difficulty of learning). But we suspect that
this simply misses an important property of natural grammars: namely, that they
do not have "counting" predicates that distinguish between two or three, or 17
domains. This is evidently a property of grammars generally, and has some power
in explaining the metrical structure of phonological rule systems (see Halle and
Vergnaud forthcoming 1985). But why do grammars have this property? If we
assume that rule systems are written in a derived fashion, as Fodor insists, then
there is no reason for it. A grammar that counts to 16 is just as easily parsed
and just as easily learned as one that does not.
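
A small sketch may help show why an explicit rule system draws no natural line
between one domain and many. The rule schema, the slash-style category names,
and the function `counting_rules` are hypothetical, invented here for
illustration and not taken from Fodor's proposal or from any particular GPSG
fragment. The index on each category records how many S's the gap has been
passed through.

```python
def counting_rules(k):
    """Build toy rules that let a gap be passed down through at most k S's.

    Each rule is (lhs, rhs). The index i marks the i-th S crossed so far.
    """
    rules = []
    for i in range(1, k + 1):
        # An S containing a gap expands to a subject plus a gapped VP.
        rules.append((f"S/NP_{i}", ("NP", f"VP/NP_{i}")))
        # The gap may be realized inside this clause.
        rules.append((f"VP/NP_{i}", ("V", "trace")))
        if i < k:
            # Or the gap may be passed one S deeper, bumping the counter.
            rules.append((f"VP/NP_{i}", ("V", f"S/NP_{i + 1}")))
    return rules

for k in (1, 3, 16):
    print(f"limit of {k:2d} S's -> {len(counting_rules(k))} rules")
```

The same schema produces the one-S grammar, the three-S grammar, and the
sixteen-S grammar; the rule count simply grows linearly with the limit, and
nothing in the notation singles out a limit of one as special.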

Suppose, in contrast, that there are no phrase structure rules, no explicit
derived rules at all. Instead, suppose that there are just individual lexical
items and their feature projections (as defined by X-bar theory), plus the
movement rules and constraints defined by GB theory. Now there cannot be any
rule of grammar that cuts across just three S domains. Individual lexical items
can subcategorize for single S's, and hence build phrases consisting of
adjacent S domains. Since movement can apply, we can move elements across these
domains. Cyclicity (iteration of this process) leads to superficially unbounded
movement. But no other constraints can even be stated. The vocabulary for
writing down grammars cannot refer to phrase structure rules, and so cannot
write down a chain of three S expansions to allow extraction across three S's
but not four. As we observed in our book, either free (unbounded) movement is
possible, or else movement across a single category is blocked; nothing in
between is allowed. This result, the noncounting evidently true of natural
grammars, follows from the nonexistence of derived phrase structure rules.

Of course, nondeterminism and the flexibility allowed in writing derived
grammars leave open many possibilities. As we have seen, this is exactly what
is wrong with a weak set of hypotheses: it leaves open too many avenues to
explore. As we said at the outset, we prefer to tackle the problem head on,
by adopting strong constraints that lead to interesting predictions and
explanations of why natural grammars are built the way they are, giving up those
constraints only when absolutely necessary. So far, we've been encouraged by
the results. Our predictions about locality principles, suitably revised, hold up.
Our modular design leads to testable hypotheses about the role of c-command
in language processing, now being probed (Weinberg and Garrett, forthcoming).
Our transparency assumption leads to noncounting grammars. We see no
reason to abandon the chase now, when we have come so far.


As far as we can tell, this property also holds in current GPSG frameworks that avoid
explicit phrase structure rules and use subcategorization and ID/LP statements instead
to define a set of admissible phrase structures. Thus this version of GPSG also obeys
noncounting.
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